Dataset statistics
| Number of variables | 13 |
|---|---|
| Number of observations | 29101 |
| Missing cells | 3043 |
| Missing cells (%) | 0.8% |
| Duplicate rows | 0 |
| Duplicate rows (%) | 0.0% |
| Total size in memory | 2.9 MiB |
| Average record size in memory | 104.0 B |
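The overview figures are consistent with the frame's shape if every cell takes 8 bytes. That is an assumption here: 64-bit numbers for the numeric columns and 8-byte object pointers for the string columns, without counting the strings themselves. The short check below rederives the memory and missing-cell figures from the row and column counts.

```python
# Sketch: rederive the overview figures from the row/column counts.
# Assumption: each cell occupies 8 bytes (64-bit numbers or object pointers),
# which is how the per-column "Memory size" of 227.4 KiB works out.
N_ROWS = 29101
N_COLS = 13
MISSING_CELLS = 3043
BYTES_PER_CELL = 8

column_bytes = N_ROWS * BYTES_PER_CELL            # bytes per column
total_bytes = column_bytes * N_COLS               # whole frame
missing_pct = 100 * MISSING_CELLS / (N_ROWS * N_COLS)

print(f"per column: {column_bytes / 1024:.1f} KiB")     # 227.4 KiB
print(f"total:      {total_bytes / 1024**2:.1f} MiB")   # 2.9 MiB
print(f"per record: {total_bytes / N_ROWS:.1f} B")      # 104.0 B
print(f"missing:    {missing_pct:.1f}%")                # 0.8%
```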
Variable types
| NUM | 10 |
|---|---|
| CAT | 2 |
| BOOL | 1 |
Reproduction
| Analysis started | 2021-12-05 15:56:23.501755 |
|---|---|
| Analysis finished | 2021-12-05 15:56:32.413725 |
| Duration | 8.91 seconds |
| Version | pandas-profiling v2.8.0 |
| Command line | pandas_profiling --config_file config.yaml [YOUR_FILE.csv] |
Warnings
| Warning | Type |
|---|---|
| pickup_dt has a high cardinality: 4343 distinct values | High cardinality |
| borough has 3043 (10.5%) missing values | Missing |
| pickup_dt is uniformly distributed | Uniform |
| borough is uniformly distributed | Uniform |
| pickups has 5567 (19.1%) zeros | Zeros |
| spd has 3596 (12.4%) zeros | Zeros |
| dewp has 303 (1.0%) zeros | Zeros |
| pcp01 has 26468 (91.0%) zeros | Zeros |
| pcp06 has 23460 (80.6%) zeros | Zeros |
| pcp24 has 18631 (64.0%) zeros | Zeros |
| sd has 20167 (69.3%) zeros | Zeros |
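The zero-count warnings are plain percentages of the 29,101 rows. The sketch below recomputes them from the zero counts reported for each numeric variable. The 0.5% cutoff is a hypothetical threshold chosen for illustration: it happens to separate dewp (1.0% zeros, flagged) from vsb (6 zeros, under 0.1%, not flagged), but the report does not state the threshold pandas-profiling actually applied.

```python
# Sketch: recompute the "Zeros" warnings as percentages of all rows.
# ZERO_THRESHOLD_PCT is a hypothetical cutoff for illustration only;
# the threshold pandas-profiling actually applies is not shown in this report.
N_ROWS = 29101
ZERO_THRESHOLD_PCT = 0.5

zero_counts = {
    "pickups": 5567, "spd": 3596, "vsb": 6, "dewp": 303,
    "pcp01": 26468, "pcp06": 23460, "pcp24": 18631, "sd": 20167,
}

def zero_warnings(counts, n_rows, threshold_pct):
    """Return {column: percent_zeros, rounded to 0.1} for columns above the threshold."""
    flagged = {}
    for column, zeros in counts.items():
        pct = 100 * zeros / n_rows
        if pct > threshold_pct:
            flagged[column] = round(pct, 1)
    return flagged

for column, pct in zero_warnings(zero_counts, N_ROWS, ZERO_THRESHOLD_PCT).items():
    print(f"{column} has {zero_counts[column]} ({pct}%) zeros")
```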
pickup_dt
Categorical
| Distinct count | 4343 |
|---|---|
| Unique (%) | 14.9% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 227.4 KiB |
| Value | Count |
|---|---|
| 2015-01-01 01:00:00 | 7 |
| 2015-04-26 10:00:00 | 7 |
| 2015-04-26 14:00:00 | 7 |
| 2015-04-26 15:00:00 | 7 |
| 2015-04-26 16:00:00 | 7 |
| Other values (4338) | 29066 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 2015-01-01 01:00:00 | 7 | < 0.1% |
| 2015-04-26 10:00:00 | 7 | < 0.1% |
| 2015-04-26 14:00:00 | 7 | < 0.1% |
| 2015-04-26 15:00:00 | 7 | < 0.1% |
| 2015-04-26 16:00:00 | 7 | < 0.1% |
| 2015-04-26 17:00:00 | 7 | < 0.1% |
| 2015-04-26 18:00:00 | 7 | < 0.1% |
| 2015-04-26 19:00:00 | 7 | < 0.1% |
| 2015-04-26 20:00:00 | 7 | < 0.1% |
| 2015-04-26 21:00:00 | 7 | < 0.1% |
| Other values (4333) | 29031 | 99.8% |
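Each common-values table lists the most frequent values and folds the rest into an "Other values (n)" row, where n is the number of remaining distinct values and the count is every row not already listed. A minimal sketch with `collections.Counter` (the function name is illustrative, not pandas-profiling's own code):

```python
from collections import Counter

def common_values(values, top=10):
    """Return (rows, other_row) in the style of a "Common values" table.

    rows: (value, count, percent) for the `top` most frequent values.
    other_row: (label, count, percent) summarising all remaining values, or None.
    """
    n = len(values)
    counts = Counter(values)
    head = counts.most_common(top)
    rows = [(value, count, 100 * count / n) for value, count in head]
    rest_distinct = len(counts) - len(head)
    if rest_distinct == 0:
        return rows, None
    rest_count = n - sum(count for _, count in head)
    return rows, (f"Other values ({rest_distinct})", rest_count, 100 * rest_count / n)

rows, other = common_values(["a", "a", "b", "c", "c", "c", "d"], top=2)
print(rows)   # [('c', 3, ...), ('a', 2, ...)]
print(other)  # ('Other values (2)', 2, ...)
```

Applied to pickup_dt, the same rule gives 4343 - 10 = 4333 remaining distinct values covering 29101 - 10 × 7 = 29031 rows, as in the table.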
Length
| Max length | 19 |
|---|---|
| Median length | 19 |
| Mean length | 19 |
| Min length | 19 |
borough
Categorical
| Distinct count | 6 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 3043 |
| Missing (%) | 10.5% |
| Memory size | 227.4 KiB |
| Value | Count |
|---|---|
| Bronx | 4343 |
| Brooklyn | 4343 |
| EWR | 4343 |
| Manhattan | 4343 |
| Queens | 4343 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| Bronx | 4343 | 14.9% |
| Brooklyn | 4343 | 14.9% |
| EWR | 4343 | 14.9% |
| Manhattan | 4343 | 14.9% |
| Queens | 4343 | 14.9% |
| Staten Island | 4343 | 14.9% |
| (Missing) | 3043 | 10.5% |
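Both "is uniformly distributed" warnings concern categorical columns. One common way to quantify uniformity is Pearson's chi-squared statistic against equal expected counts; treating it as pandas-profiling's exact criterion is an assumption here. For borough, all six observed categories have 4343 rows, so the statistic is exactly zero (missing rows excluded). The sketch also checks that the counts add up: 6 × 4343 + 3043 = 29101.

```python
def chi_squared_uniform(counts):
    """Pearson chi-squared statistic of observed counts against a uniform expectation."""
    total = sum(counts)
    expected = total / len(counts)
    return sum((observed - expected) ** 2 / expected for observed in counts)

# Bronx, Brooklyn, EWR, Manhattan, Queens, Staten Island (missing rows excluded)
borough_counts = [4343] * 6
print(chi_squared_uniform(borough_counts))  # 0.0: perfectly uniform
print(6 * 4343 + 3043)                      # 29101 rows, including missing borough
```

A statistic of zero means the observed counts match the uniform expectation exactly; turning a non-zero statistic into a p-value needs the chi-squared distribution (for example from SciPy), which this stdlib sketch leaves out.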
Length
| Max length | 13 |
|---|---|
| Median length | 6 |
| Mean length | 6.880210302 |
| Min length | 3 |
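The borough length statistics only work out if the 3043 missing entries are measured too. Counting them as the three-character string "nan" (what `str(float("nan"))` produces in Python) reproduces the reported mean of 6.880210302 exactly, whereas the six real names alone would average 44 / 6 ≈ 7.33. That the report stringifies missing values this way is an inference from the numbers, not something it states.

```python
from statistics import mean, median

# Row counts per borough, taken from the common-values table above.
borough_rows = {
    "Bronx": 4343, "Brooklyn": 4343, "EWR": 4343,
    "Manhattan": 4343, "Queens": 4343, "Staten Island": 4343,
}
MISSING = 3043

# Assumption: missing values are converted to the string "nan" before measuring length.
lengths = [len(name) for name, count in borough_rows.items() for _ in range(count)]
lengths += [len(str(float("nan")))] * MISSING

print(max(lengths), median(lengths), round(mean(lengths), 9), min(lengths))
# 13 6 6.880210302 3
```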
pickups
Real number (ℝ≥0)
| Distinct count | 3406 |
|---|---|
| Unique (%) | 11.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 490.2159032335659 |
|---|---|
| Minimum | 0 |
| Maximum | 7883 |
| Zeros | 5567 |
| Zeros (%) | 19.1% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 1 |
| median | 54 |
| Q3 | 449 |
| 95-th percentile | 2840 |
| Maximum | 7883 |
| Range | 7883 |
| Interquartile range (IQR) | 448 |
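The quantile table is internally consistent: Range = Maximum - Minimum = 7883 - 0 = 7883 and IQR = Q3 - Q1 = 449 - 1 = 448. The percentiles are most likely computed by linear interpolation between order statistics, which is the default of pandas' `Series.quantile` and NumPy's `percentile`; that the report uses this rule is an assumption. The sketch implements the rule with the standard library. The sample is hypothetical, since the raw column is not part of this report.

```python
def quantile(sorted_values, q):
    """Linear-interpolation quantile (the default rule in pandas and NumPy)."""
    if not 0 <= q <= 1:
        raise ValueError("q must be in [0, 1]")
    position = q * (len(sorted_values) - 1)
    lower = int(position)
    upper = min(lower + 1, len(sorted_values) - 1)
    fraction = position - lower
    return sorted_values[lower] + fraction * (sorted_values[upper] - sorted_values[lower])

def quantile_summary(values):
    """Build the report's "Quantile statistics" rows for a list of numbers."""
    data = sorted(values)
    q1, q3 = quantile(data, 0.25), quantile(data, 0.75)
    return {
        "Minimum": data[0],
        "5-th percentile": quantile(data, 0.05),
        "Q1": q1,
        "median": quantile(data, 0.5),
        "Q3": q3,
        "95-th percentile": quantile(data, 0.95),
        "Maximum": data[-1],
        "Range": data[-1] - data[0],
        "Interquartile range (IQR)": q3 - q1,
    }

# Hypothetical sample of hourly pickup counts (not taken from the dataset).
sample = [0, 0, 1, 3, 54, 120, 449, 900, 2840, 7883]
for name, value in quantile_summary(sample).items():
    print(f"{name}: {value:g}")
```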
Descriptive statistics
| Standard deviation | 995.6495355 |
|---|---|
| Coefficient of variation (CV) | 2.031042912 |
| Kurtosis | 9.26766556 |
| Mean | 490.2159032 |
| Median Absolute Deviation (MAD) | 54 |
| Skewness | 2.976238116 |
| Sum | 14265773 |
| Variance | 991317.9975 |
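The descriptive statistics are tied together by simple identities. Variance is the squared standard deviation (995.6495355² ≈ 991317.9975 for pickups) and the coefficient of variation is the standard deviation divided by the mean (995.6495355 / 490.2159032 ≈ 2.031). The sketch below computes the whole block for a list of numbers. It assumes sample (n - 1) estimators and the bias-corrected skewness and excess kurtosis that pandas' `Series.skew` and `Series.kurt` return; whether the report used exactly these estimators is an assumption.

```python
import math
from statistics import mean, median, stdev

def descriptive_stats(values):
    """Descriptive statistics in the style of the report (needs n >= 4, non-constant data)."""
    n = len(values)
    if n < 4:
        raise ValueError("need at least 4 values for the kurtosis estimator")
    mu = mean(values)
    sd = stdev(values)                              # sample standard deviation (n - 1)
    med = median(values)
    m2 = sum((x - mu) ** 2 for x in values) / n     # biased central moments
    m3 = sum((x - mu) ** 3 for x in values) / n
    m4 = sum((x - mu) ** 4 for x in values) / n
    if m2 == 0:
        raise ValueError("skewness and kurtosis are undefined for constant data")
    g1 = m3 / m2 ** 1.5
    g2 = m4 / m2 ** 2 - 3
    return {
        "Standard deviation": sd,
        "Coefficient of variation (CV)": sd / mu,   # undefined when the mean is 0
        # Excess kurtosis with the usual small-sample bias correction.
        "Kurtosis": ((n + 1) * g2 + 6) * (n - 1) / ((n - 2) * (n - 3)),
        "Mean": mu,
        "Median Absolute Deviation (MAD)": median(abs(x - med) for x in values),
        # Adjusted Fisher-Pearson skewness.
        "Skewness": math.sqrt(n * (n - 1)) / (n - 2) * g1,
        "Sum": sum(values),
        "Variance": sd ** 2,
    }

for name, value in descriptive_stats([1, 2, 3, 4, 10]).items():
    print(f"{name}: {value:.6g}")
```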
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5567 | 19.1% |
| 1 | 2656 | 9.1% |
| 2 | 1698 | 5.8% |
| 3 | 937 | 3.2% |
| 4 | 474 | 1.6% |
| 5 | 257 | 0.9% |
| 6 | 128 | 0.4% |
| 36 | 85 | 0.3% |
| 45 | 84 | 0.3% |
| 32 | 81 | 0.3% |
| Other values (3396) | 17134 | 58.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 5567 | 19.1% |
| 1 | 2656 | 9.1% |
| 2 | 1698 | 5.8% |
| 3 | 937 | 3.2% |
| 4 | 474 | 1.6% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 7883 | 1 | < 0.1% |
| 7801 | 1 | < 0.1% |
| 7711 | 1 | < 0.1% |
| 7512 | 1 | < 0.1% |
| 7271 | 1 | < 0.1% |
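"Fixed size bins (bins=10)" means the range from minimum to maximum is cut into 10 equal-width intervals. For pickups (0 to 7883) each bin is 788.3 pickups wide, and since Q3 is 449, at least three quarters of the rows fall in the first bin. The sketch below counts values per bin in the style of NumPy's `histogram`; the function name is hypothetical and this is not the report's plotting code.

```python
def fixed_width_histogram(values, bins=10):
    """Count values in `bins` equal-width bins spanning [min, max].

    Every bin is half-open [left, right) except the last, which also includes
    the maximum. Values lying exactly on an inner edge can be shifted by float
    rounding; NumPy's histogram handles that case more carefully.
    """
    lo, hi = min(values), max(values)
    if lo == hi:                          # degenerate case: a single distinct value
        return [lo, hi], [len(values)]
    width = (hi - lo) / bins
    edges = [lo + i * width for i in range(bins)] + [hi]
    counts = [0] * bins
    for x in values:
        index = min(int((x - lo) / width), bins - 1)   # clamp the maximum into the last bin
        counts[index] += 1
    return edges, counts

edges, counts = fixed_width_histogram(list(range(11)), bins=5)
print(edges)   # [0.0, 2.0, 4.0, 6.0, 8.0, 10]
print(counts)  # [2, 2, 2, 2, 3]
```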
spd
Real number (ℝ≥0)
| Distinct count | 114 |
|---|---|
| Unique (%) | 0.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 5.98492418031781 |
|---|---|
| Minimum | 0.0 |
| Maximum | 21.0 |
| Zeros | 3596 |
| Zeros (%) | 12.4% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 3 |
| median | 6 |
| Q3 | 8 |
| 95-th percentile | 13 |
| Maximum | 21 |
| Range | 21 |
| Interquartile range (IQR) | 5 |
Descriptive statistics
| Standard deviation | 3.699007242 |
|---|---|
| Coefficient of variation (CV) | 0.6180541525 |
| Kurtosis | 0.4192409725 |
| Mean | 5.98492418 |
| Median Absolute Deviation (MAD) | 2 |
| Skewness | 0.4190693213 |
| Sum | 174167.2786 |
| Variance | 13.68265458 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 5 | 3816 | 13.1% |
| 0 | 3596 | 12.4% |
| 6 | 3545 | 12.2% |
| 3 | 3432 | 11.8% |
| 7 | 3021 | 10.4% |
| 8 | 2574 | 8.8% |
| 9 | 1592 | 5.5% |
| 10 | 1369 | 4.7% |
| 11 | 936 | 3.2% |
| 13 | 557 | 1.9% |
| Other values (104) | 4663 | 16.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 3596 | 12.4% |
| 0.6 | 21 | 0.1% |
| 0.75 | 42 | 0.1% |
| 1 | 66 | 0.2% |
| 1.2 | 21 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 21 | 7 | < 0.1% |
| 20 | 32 | 0.1% |
| 18 | 71 | 0.2% |
| 17.5 | 7 | < 0.1% |
| 17 | 120 | 0.4% |
vsb
Real number (ℝ≥0)
| Distinct count | 179 |
|---|---|
| Unique (%) | 0.6% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 8.818124896706218 |
|---|---|
| Minimum | 0.0 |
| Maximum | 10.0 |
| Zeros | 6 |
| Zeros (%) | < 0.1% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 2.575 |
| Q1 | 9.1 |
| median | 10 |
| Q3 | 10 |
| 95-th percentile | 10 |
| Maximum | 10 |
| Range | 10 |
| Interquartile range (IQR) | 0.9 |
Descriptive statistics
| Standard deviation | 2.442897359 |
|---|---|
| Coefficient of variation (CV) | 0.2770313856 |
| Kurtosis | 2.898539633 |
| Mean | 8.818124897 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | -2.042058313 |
| Sum | 256616.2526 |
| Variance | 5.967747505 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 21578 | 74.1% |
| 9.1 | 1137 | 3.9% |
| 8 | 845 | 2.9% |
| 7 | 780 | 2.7% |
| 6 | 560 | 1.9% |
| 4 | 403 | 1.4% |
| 5 | 395 | 1.4% |
| 3 | 267 | 0.9% |
| 0.3 | 127 | 0.4% |
| 2.833333333 | 108 | 0.4% |
| Other values (169) | 2901 | 10.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 6 | < 0.1% |
| 0.3 | 127 | 0.4% |
| 0.3333333333 | 6 | < 0.1% |
| 0.3666666667 | 7 | < 0.1% |
| 0.4 | 20 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 10 | 21578 | 74.1% |
| 9.775 | 7 | < 0.1% |
| 9.7 | 20 | 0.1% |
| 9.55 | 72 | 0.2% |
| 9.333333333 | 14 | < 0.1% |
temp
Real number (ℝ≥0)
| Distinct count | 295 |
|---|---|
| Unique (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 47.669042055501286 |
|---|---|
| Minimum | 2.0 |
| Maximum | 89.0 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 2 |
|---|---|
| 5-th percentile | 18 |
| Q1 | 32 |
| median | 46 |
| Q3 | 64.5 |
| 95-th percentile | 79.5 |
| Maximum | 89 |
| Range | 87 |
| Interquartile range (IQR) | 32.5 |
Descriptive statistics
| Standard deviation | 19.81496901 |
|---|---|
| Coefficient of variation (CV) | 0.4156779359 |
| Kurtosis | -1.037412126 |
| Mean | 47.66904206 |
| Median Absolute Deviation (MAD) | 16 |
| Skewness | 0.05575251227 |
| Sum | 1387216.793 |
| Variance | 392.6329968 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 37 | 675 | 2.3% |
| 36 | 595 | 2.0% |
| 35 | 548 | 1.9% |
| 42 | 531 | 1.8% |
| 38 | 529 | 1.8% |
| 34 | 514 | 1.8% |
| 27 | 508 | 1.7% |
| 61 | 502 | 1.7% |
| 41 | 494 | 1.7% |
| 39 | 494 | 1.7% |
| Other values (285) | 23711 | 81.5% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2 | 20 | 0.1% |
| 3 | 14 | < 0.1% |
| 4 | 85 | 0.3% |
| 5 | 33 | 0.1% |
| 6 | 41 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 89 | 28 | 0.1% |
| 88 | 56 | 0.2% |
| 87 | 54 | 0.2% |
| 86 | 61 | 0.2% |
| 85 | 144 | 0.5% |
dewp
Real number (ℝ)
| Distinct count | 305 |
|---|---|
| Unique (%) | 1.0% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 30.823064908586023 |
|---|---|
| Minimum | -16.0 |
| Maximum | 73.0 |
| Zeros | 303 |
| Zeros (%) | 1.0% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | -16 |
|---|---|
| 5-th percentile | -2 |
| Q1 | 14 |
| median | 30 |
| Q3 | 50 |
| 95-th percentile | 64.66666667 |
| Maximum | 73 |
| Range | 89 |
| Interquartile range (IQR) | 36 |
Descriptive statistics
| Standard deviation | 21.28344434 |
|---|---|
| Coefficient of variation (CV) | 0.6905038288 |
| Kurtosis | -1.035223571 |
| Mean | 30.82306491 |
| Median Absolute Deviation (MAD) | 18 |
| Skewness | 0.0154181971 |
| Sum | 896982.0119 |
| Variance | 452.9850028 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 39 | 578 | 2.0% |
| 22 | 525 | 1.8% |
| 18 | 518 | 1.8% |
| 56 | 508 | 1.7% |
| 61 | 492 | 1.7% |
| 25 | 486 | 1.7% |
| 40 | 473 | 1.6% |
| 20 | 465 | 1.6% |
| 10 | 460 | 1.6% |
| 60 | 450 | 1.5% |
| Other values (295) | 24146 | 83.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| -16 | 34 | 0.1% |
| -15 | 27 | 0.1% |
| -13 | 86 | 0.3% |
| -12 | 98 | 0.3% |
| -11 | 121 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 73 | 7 | < 0.1% |
| 72 | 14 | < 0.1% |
| 71.5 | 12 | < 0.1% |
| 71.25 | 7 | < 0.1% |
| 71 | 70 | 0.2% |
slp
Real number (ℝ≥0)
| Distinct count | 413 |
|---|---|
| Unique (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 1017.81793752792 |
|---|---|
| Minimum | 991.4 |
| Maximum | 1043.4 |
| Zeros | 0 |
| Zeros (%) | 0.0% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 991.4 |
|---|---|
| 5-th percentile | 1005.3 |
| Q1 | 1012.5 |
| median | 1018.2 |
| Q3 | 1022.9 |
| 95-th percentile | 1030 |
| Maximum | 1043.4 |
| Range | 52 |
| Interquartile range (IQR) | 10.4 |
Descriptive statistics
| Standard deviation | 7.76879558 |
|---|---|
| Coefficient of variation (CV) | 0.007632794917 |
| Kurtosis | 0.06914463865 |
| Mean | 1017.817938 |
| Median Absolute Deviation (MAD) | 5.2 |
| Skewness | 0.05284461782 |
| Sum | 29619519.8 |
| Variance | 60.35418476 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 1020 | 269 | 0.9% |
| 1020.5 | 255 | 0.9% |
| 1019.9 | 237 | 0.8% |
| 1020.9 | 227 | 0.8% |
| 1021.1 | 226 | 0.8% |
| 1022.7 | 214 | 0.7% |
| 1020.2 | 213 | 0.7% |
| 1020.3 | 209 | 0.7% |
| 1020.7 | 204 | 0.7% |
| 1021.2 | 204 | 0.7% |
| Other values (403) | 26843 | 92.2% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 991.4 | 7 | < 0.1% |
| 991.6 | 7 | < 0.1% |
| 992.3 | 7 | < 0.1% |
| 992.9 | 7 | < 0.1% |
| 993.4 | 7 | < 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1043.4 | 7 | < 0.1% |
| 1043.3 | 7 | < 0.1% |
| 1043.2 | 7 | < 0.1% |
| 1043.1 | 7 | < 0.1% |
| 1042.9 | 6 | < 0.1% |
pcp01
Real number (ℝ≥0)
| Distinct count | 80 |
|---|---|
| Unique (%) | 0.3% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.003830149021224929 |
|---|---|
| Minimum | 0.0 |
| Maximum | 0.28 |
| Zeros | 26468 |
| Zeros (%) | 91.0% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0.02 |
| Maximum | 0.28 |
| Range | 0.28 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.01893306515 |
|---|---|
| Coefficient of variation (CV) | 4.943166713 |
| Kurtosis | 87.81998828 |
| Mean | 0.003830149021 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 8.220954559 |
| Sum | 111.4611667 |
| Variance | 0.0003584609559 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 26468 | 91.0% |
| 0.01 | 439 | 1.5% |
| 0.02 | 181 | 0.6% |
| 0.03 | 147 | 0.5% |
| 0.005 | 141 | 0.5% |
| 0.05 | 115 | 0.4% |
| 0.003333333333 | 95 | 0.3% |
| 0.04 | 79 | 0.3% |
| 0.015 | 78 | 0.3% |
| 0.06 | 75 | 0.3% |
| Other values (70) | 1283 | 4.4% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 26468 | 91.0% |
| 0.0025 | 40 | 0.1% |
| 0.003333333333 | 95 | 0.3% |
| 0.005 | 141 | 0.5% |
| 0.006666666667 | 62 | 0.2% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0.28 | 21 | 0.1% |
| 0.2675 | 7 | < 0.1% |
| 0.26 | 7 | < 0.1% |
| 0.2533333333 | 7 | < 0.1% |
| 0.25 | 7 | < 0.1% |
pcp06
Real number (ℝ≥0)
| Distinct count | 318 |
|---|---|
| Unique (%) | 1.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.02612874128036837 |
|---|---|
| Minimum | 0.0 |
| Maximum | 1.24 |
| Zeros | 23460 |
| Zeros (%) | 80.6% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0 |
| 95-th percentile | 0.1875 |
| Maximum | 1.24 |
| Range | 1.24 |
| Interquartile range (IQR) | 0 |
Descriptive statistics
| Standard deviation | 0.09312533965 |
|---|---|
| Coefficient of variation (CV) | 3.564095899 |
| Kurtosis | 47.35606139 |
| Mean | 0.02612874128 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 5.936438429 |
| Sum | 760.3725 |
| Variance | 0.008672328884 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 23460 | 80.6% |
| 0.01 | 841 | 2.9% |
| 0.02 | 258 | 0.9% |
| 0.03 | 177 | 0.6% |
| 0.05 | 152 | 0.5% |
| 0.005 | 121 | 0.4% |
| 0.04 | 110 | 0.4% |
| 0.06 | 95 | 0.3% |
| 0.08 | 92 | 0.3% |
| 0.003333333333 | 87 | 0.3% |
| Other values (308) | 3708 | 12.7% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 23460 | 80.6% |
| 0.0025 | 60 | 0.2% |
| 0.003333333333 | 87 | 0.3% |
| 0.005 | 121 | 0.4% |
| 0.006666666667 | 33 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 1.24 | 6 | < 0.1% |
| 1.22 | 6 | < 0.1% |
| 1.21 | 7 | < 0.1% |
| 1.083 | 7 | < 0.1% |
| 1.018 | 7 | < 0.1% |
pcp24
Real number (ℝ≥0)
| Distinct count | 484 |
|---|---|
| Unique (%) | 1.7% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 0.09046437121290214 |
|---|---|
| Minimum | 0.0 |
| Maximum | 2.1 |
| Zeros | 18631 |
| Zeros (%) | 64.0% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 0.05 |
| 95-th percentile | 0.5755 |
| Maximum | 2.1 |
| Range | 2.1 |
| Interquartile range (IQR) | 0.05 |
Descriptive statistics
| Standard deviation | 0.2194022017 |
|---|---|
| Coefficient of variation (CV) | 2.425288528 |
| Kurtosis | 16.22082135 |
| Mean | 0.09046437121 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 3.605783873 |
| Sum | 2632.603667 |
| Variance | 0.04813732611 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18631 | 64.0% |
| 0.01 | 1259 | 4.3% |
| 0.05 | 367 | 1.3% |
| 0.02 | 306 | 1.1% |
| 0.09 | 271 | 0.9% |
| 0.08333333333 | 256 | 0.9% |
| 0.06 | 225 | 0.8% |
| 0.08 | 217 | 0.7% |
| 0.1793333333 | 152 | 0.5% |
| 0.03 | 149 | 0.5% |
| Other values (474) | 7268 | 25.0% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 18631 | 64.0% |
| 0.0025 | 72 | 0.2% |
| 0.003333333333 | 91 | 0.3% |
| 0.005 | 87 | 0.3% |
| 0.005833333333 | 107 | 0.4% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 2.1 | 13 | < 0.1% |
| 1.89 | 7 | < 0.1% |
| 1.503833333 | 64 | 0.2% |
| 1.493833333 | 7 | < 0.1% |
| 1.49 | 7 | < 0.1% |
sd
Real number (ℝ≥0)
| Distinct count | 421 |
|---|---|
| Unique (%) | 1.4% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Infinite | 0 |
| Infinite (%) | 0.0% |
| Mean | 2.5291692438976896 |
|---|---|
| Minimum | 0.0 |
| Maximum | 19.0 |
| Zeros | 20167 |
| Zeros (%) | 69.3% |
| Memory size | 227.4 KiB |
Quantile statistics
| Minimum | 0 |
|---|---|
| 5-th percentile | 0 |
| Q1 | 0 |
| median | 0 |
| Q3 | 2.958333333 |
| 95-th percentile | 12.16666667 |
| Maximum | 19 |
| Range | 19 |
| Interquartile range (IQR) | 2.958333333 |
Descriptive statistics
| Standard deviation | 4.520325424 |
|---|---|
| Coefficient of variation (CV) | 1.787276765 |
| Kurtosis | 1.313944097 |
| Mean | 2.529169244 |
| Median Absolute Deviation (MAD) | 0 |
| Skewness | 1.589743978 |
| Sum | 73601.35417 |
| Variance | 20.43334194 |
Histogram with fixed size bins (bins=10)
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 20167 | 69.3% |
| 8 | 1934 | 6.6% |
| 11 | 362 | 1.2% |
| 12 | 345 | 1.2% |
| 9 | 334 | 1.1% |
| 7 | 182 | 0.6% |
| 1 | 181 | 0.6% |
| 13 | 180 | 0.6% |
| 2 | 175 | 0.6% |
| 0.75 | 40 | 0.1% |
| Other values (411) | 5201 | 17.9% |
Minimum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 0 | 20167 | 69.3% |
| 0.04166666667 | 19 | 0.1% |
| 0.04166666667 | 13 | < 0.1% |
| 0.08333333333 | 13 | < 0.1% |
| 0.08333333333 | 19 | 0.1% |
Maximum 5 values
| Value | Count | Frequency (%) |
|---|---|---|
| 19 | 7 | < 0.1% |
| 18.95833333 | 7 | < 0.1% |
| 18.91666667 | 6 | < 0.1% |
| 18.875 | 6 | < 0.1% |
| 18.83333333 | 7 | < 0.1% |
hday
Boolean
| Distinct count | 2 |
|---|---|
| Unique (%) | < 0.1% |
| Missing | 0 |
| Missing (%) | 0.0% |
| Memory size | 227.4 KiB |
| Value | Count |
|---|---|
| N | 27980 |
| Y | 1121 |
Common values
| Value | Count | Frequency (%) |
|---|---|---|
| N | 27980 | 96.1% |
| Y | 1121 | 3.9% |
Correlations
Pearson's r
The Pearson correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1: -1 indicates total negative linear correlation, 0 no linear correlation and 1 total positive linear correlation. Furthermore, r is invariant under separate changes in location and (positive) scale of the two variables, so for an exact linear relationship only the sign of the slope, not its steepness, affects r. To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
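Pearson's r can be computed straight from this definition. The 1/(n - 1) factors in the covariance and in both standard deviations cancel, so the sketch below works with plain sums of products. It is a minimal illustration, not the implementation behind the report.

```python
import math

def pearson_r(x, y):
    """Covariance of x and y divided by the product of their standard deviations."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("x and y need the same length (at least 2)")
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)    # the 1/(n - 1) factors cancel

print(pearson_r([1, 2, 3, 4], [8, 11, 14, 17]))  # ~1.0: y = 3x + 5 is exactly linear
print(pearson_r([1, 2, 3], [3, 2, 1]))           # ~-1.0: negative slope
```

Because y = 3x + 5 differs from x only by a location shift and a positive scale factor, r is 1 (up to floating-point rounding), illustrating the invariance described above.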
Spearman's ρ
The Spearman rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better than Pearson's r at capturing nonlinear monotonic correlations. Its value lies between -1 and +1: -1 indicates total negative monotonic correlation, 0 no monotonic correlation and 1 total positive monotonic correlation. To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
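In code, Spearman's ρ is Pearson's r applied to ranks. The sketch gives tied values the average of the positions they occupy, which is the usual convention (an assumption about how ties are handled here). For y = x³ the relationship is monotonic but not linear, so ρ is 1 while Pearson's r stays below 1.

```python
import math

def average_ranks(values):
    """1-based ranks; tied values share the mean of the positions they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1          # mean of positions i..j, made 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks

def pearson_r(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    return cov / math.sqrt(sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y))

def spearman_rho(x, y):
    """Pearson's r computed on the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))

x = [1, 2, 3, 4, 5]
y = [v ** 3 for v in x]
print(spearman_rho(x, y), pearson_r(x, y))  # rho is 1.0, r is below 1
```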
Kendall's τ
Similarly to Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1: -1 indicates total negative correlation, 0 no correlation and 1 total positive correlation. To calculate τ for two variables X and Y, one counts the concordant and discordant pairs of observations: τ is the number of concordant pairs minus the number of discordant pairs, divided by the total number of pairs, n(n - 1)/2.
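The pair-counting rule above corresponds to the tau-a variant, where tied pairs count as neither concordant nor discordant. The sketch implements exactly that with an O(n²) loop over pairs. Note that SciPy's `scipy.stats.kendalltau` defaults to tau-b, which adjusts the denominator for ties, so the two can differ on tied data.

```python
from itertools import combinations

def kendall_tau_a(x, y):
    """(concordant - discordant) / total pairs; tied pairs count as neither."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y need the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        sign = (x[i] - x[j]) * (y[i] - y[j])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    return (concordant - discordant) / (n * (n - 1) / 2)

print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))  # 5 concordant, 1 discordant: 4/6
```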
Phik (φk)
Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in the case of a bivariate normal input distribution. Extensive documentation is available for the phik package.
First rows
| | pickup_dt | borough | pickups | spd | vsb | temp | dewp | slp | pcp01 | pcp06 | pcp24 | sd | hday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 2015-01-01 01:00:00 | Bronx | 152 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 1 | 2015-01-01 01:00:00 | Brooklyn | 1519 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 2 | 2015-01-01 01:00:00 | EWR | 0 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 3 | 2015-01-01 01:00:00 | Manhattan | 5258 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 4 | 2015-01-01 01:00:00 | Queens | 405 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 5 | 2015-01-01 01:00:00 | Staten Island | 6 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 6 | 2015-01-01 01:00:00 | NaN | 4 | 5.0 | 10.0 | 30.0 | 7.0 | 1023.5 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 7 | 2015-01-01 02:00:00 | Bronx | 120 | 3.0 | 10.0 | 30.0 | 6.0 | 1023.0 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 8 | 2015-01-01 02:00:00 | Brooklyn | 1229 | 3.0 | 10.0 | 30.0 | 6.0 | 1023.0 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
| 9 | 2015-01-01 02:00:00 | EWR | 0 | 3.0 | 10.0 | 30.0 | 6.0 | 1023.0 | 0.0 | 0.0 | 0.0 | 0.0 | Y |
Last rows
| | pickup_dt | borough | pickups | spd | vsb | temp | dewp | slp | pcp01 | pcp06 | pcp24 | sd | hday |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 29091 | 2015-06-30 22:00:00 | Manhattan | 4452 | 5.0 | 10.0 | 76.0 | 64.0 | 1011.9 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29092 | 2015-06-30 22:00:00 | Queens | 556 | 5.0 | 10.0 | 76.0 | 64.0 | 1011.9 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29093 | 2015-06-30 22:00:00 | Staten Island | 2 | 5.0 | 10.0 | 76.0 | 64.0 | 1011.9 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29094 | 2015-06-30 23:00:00 | Bronx | 67 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29095 | 2015-06-30 23:00:00 | Brooklyn | 990 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29096 | 2015-06-30 23:00:00 | EWR | 0 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29097 | 2015-06-30 23:00:00 | Manhattan | 3828 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29098 | 2015-06-30 23:00:00 | Queens | 580 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29099 | 2015-06-30 23:00:00 | Staten Island | 0 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |
| 29100 | 2015-06-30 23:00:00 | NaN | 3 | 7.0 | 10.0 | 75.0 | 65.0 | 1011.8 | 0.0 | 0.0 | 0.0 | 0.0 | N |